CURIO: A Fast Outlier and Outlier Cluster Detection Algorithm for Large Datasets
Abstract
Outlier (or anomaly) detection is an important problem in many domains, including fraud detection, risk analysis, network intrusion and medical diagnosis, and the discovery of significant outliers is becoming an integral aspect of data mining. This paper presents CURIO, a novel algorithm that uses quantisation and implied distance metrics to run in time linear in the number of objects while requiring only two sequential scans of disk-resident datasets. CURIO includes a novel direct quantisation technique and the explicit discovery of outlier clusters. Moreover, a major attribute of CURIO is its simplicity and economy with respect to algorithm, memory footprint and data structures.
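The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea of quantisation-based outlier detection in two sequential scans. It is not the CURIO procedure itself. The function names, the `cell_width` and `tolerance` parameters, and the rule of summing counts over adjacent cells are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def quantise(point, cell_width):
    """Map a numeric point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_width) for x in point)


def neighbourhood_count(cell, counts):
    """Sum the counts of a cell and every cell adjacent to it."""
    total = 0
    for offset in product((-1, 0, 1), repeat=len(cell)):
        neighbour = tuple(c + o for c, o in zip(cell, offset))
        total += counts.get(neighbour, 0)
    return total


def find_outliers(read_points, cell_width, tolerance):
    """Flag points lying in sparse regions of the grid.

    read_points is a hypothetical callable returning a fresh iterator
    over the dataset, standing in for one sequential scan of a file.
    """
    # Scan 1: count how many points fall into each cell.
    counts = Counter(quantise(p, cell_width) for p in read_points())
    # Scan 2: report points whose cell neighbourhood holds few points.
    return [
        p for p in read_points()
        if neighbourhood_count(quantise(p, cell_width), counts) <= tolerance
    ]


points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.5), (9.0, 9.0)]
outliers = find_outliers(lambda: iter(points), cell_width=1.0, tolerance=1)
print(outliers)  # [(9.0, 9.0)]
```

In this sketch each point costs a fixed number of cell lookups for a given dimensionality, so the work grows linearly with the number of objects, and only the cell counts are held in memory. Raising `tolerance` above 1 would also flag small isolated groups of points, which is one simple way to think about outlier clusters.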
Related resources
Cell-DROS: A Fast Outlier Detection Method for Big Datasets
Outlier detection is one of the obstacles of big dataset analysis because of its time consumption issues. This paper proposes a fast outlier detection method for big datasets, which is a combination of cell-based algorithms and a ranking-based algorithm with various depths. A cell-based algorithm is proposed to transform a very large dataset to a fairly small set of weighted cells based on pred...
A Fast Greedy Algorithm for Outlier Mining
The task of outlier detection is to find small groups of data objects that are exceptional when compared with the remaining large amount of data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem and a local-search heuristic based algorithm (LSA) was presented. However, as is the case with most iterative type algorithms, the LSA algorithm is still very t...
A Spectral Clustering Based Outlier Detection Technique
Outlier detection is of increasingly high practical value in many application areas such as intrusion detection, fraud detection, discovery of criminal activities in electronic commerce and so on. Many techniques have been developed for outlier detection, including distribution-based outlier detection algorithm, depth-based outlier detection algorithm, distance-based outlier detection algor...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
A Study of Clustering Based Algorithm for Outlier Detection in Data streams
Recently many researchers have focused on mining data streams and have proposed many techniques and algorithms for data streams. Data stream mining refers to the process of extracting knowledge from nonstop, fast-growing data records. These techniques include data stream classification, data stream clustering, data stream frequent pattern items and so on. Data stream clustering techniques are highly helpful to cluster the si...
Publication date: 2007